Multi-Factor Duplicate Question Detection in Stack Overflow
Similar resources
StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow
Stack Overflow (SO) has been a great source of natural language questions and their code solutions (i.e., question-code pairs), which are critical for many tasks including code retrieval and annotation. In most existing research, question-code pairs were collected heuristically and tend to have low quality. In this paper, we investigate a new problem of systematically mining question-code pairs...
Duplicate Question Pair Detection with Deep Learning
Determining whether two questions are asking the same thing can be challenging, as word choice and sentence structure can vary significantly. Traditional natural language processing techniques such as shingling have been found to have limited success in separating related questions from duplicate questions. Using a dataset of 400,000 labeled question pairs provided by question-and-answer forum Q...
Stack Overflow Query Outcome Prediction
Stack Overflow’s core mission is to create an online encyclopedia for all programming knowledge. To ensure quality content in the face of rapid growth, community moderators frequently close low-quality questions, often asked by newcomers. To alleviate moderator burden and ease newcomers’ transition, we devise two classifiers to predict 1) whether a question will be closed and ...
Ways of Asking and Replying in Duplicate Question Detection
This paper presents the results of systematic experimentation on the impact of different types of questions on duplicate question detection, across both a number of established approaches and a novel, superior one used to address this language processing task. This study offers novel insight into the different levels of robustness of the diverse detection methods with respect to differe...
CASE-QA: Context and Syntax embeddings for Question Answering On Stack Overflow
Question answering (QA) systems rely on both knowledge bases and unstructured text corpora. Domain-specific QA presents a unique challenge, since relevant knowledge bases are often lacking and unstructured text is difficult to query and parse. This project focuses on the QUASAR-S dataset (Dhingra et al., 2017) constructed from the community QA site Stack Overflow. QUASAR-S consists of Cloze-sty...
Journal
Journal title: Journal of Computer Science and Technology
Year: 2015
ISSN: 1000-9000, 1860-4749
DOI: 10.1007/s11390-015-1576-4